Follow-up to Questions from BTEP Reproducible R with Git class
Question 1:
Is Git local software that manages the versions of a file, or is it cloud-based?
- Git is software that is installed locally, either on a personal computer or on a High Performance Computing cluster such as Biowulf at NIH (see below; replace user with your own Biowulf ID).
[user@biowulf ~]$ module whatis git
git/2.45.2 : Name => git
git/2.45.2 : Version => 2.45.2
git/2.45.2 : Category => version control
git/2.45.2 : Description => git dvcs
git/2.45.2 : URL => http://git-scm.com
- To install Git on a personal computer, visit https://git-scm.com/downloads and download the version appropriate for your operating system. Windows users are also encouraged to install Git Bash. Contact your institutional computing help desk to request that software be installed on your government-furnished computer. NCI scientists can install software on GFE Macs on their own, but NCI affiliates using Windows should contact service.cancer.gov.
- Git can also be installed through package managers such as Homebrew (https://formulae.brew.sh/formula/git) and Anaconda. See https://bioinformatics.ccr.cancer.gov/btep/getting-started-with-an-nih-anaconda-business-license/ to learn about accessing the NIH Anaconda Business license.
- We are aware that scientists may be using cloud platforms such as Cancer Genomics Cloud for their work. Some cloud services can launch RStudio, terminals, and Jupyter Notebook. Check with the specific cloud provider about using Git.
Question 2:
What is an initial commit?
- An initial commit is the first time that the history of a script or scripts in a project is saved.
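As a rough sketch of what this looks like on the command line: the folder name, file name, and user identity below are hypothetical placeholders, and a temporary folder is used so that nothing in an existing project is touched.

```shell
# Work in a throwaway folder (hypothetical example project).
project="$(mktemp -d)/my_analysis"
mkdir -p "$project"
cd "$project"

# A script whose history we want to track (hypothetical file name).
echo 'print("hello")' > analysis.R

# Turn the folder into a Git repository.
git init -q

# Identify yourself for this repository only (placeholder values).
git config user.name "Example User"
git config user.email "user@example.org"

# Stage the script and record the initial commit.
git add analysis.R
git commit -q -m "Initial commit"

# The history now holds a single commit, and that commit has no parent.
git --no-pager log --oneline
git rev-list --max-parents=0 HEAD
```

Every later commit in the project builds on this first snapshot.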
Question 3:
Can you reference the SHA in R code to compare between file versions? If so, what information can you obtain about the files?
- When versioning with Git in RStudio, you will likely not use the SHA directly. However, you can perform Git versioning tasks at the command prompt if preferred. Please refer to https://swcarpentry.github.io/git-novice/05-history.html for some ways to use the SHA to compare versions. Information obtained from comparing versions includes:
- The starting line in the file or script in which the comparison is being made.
- Number of lines in both versions of a file or script.
- What has been added or deleted between the two versions.
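The comparison described above can be sketched at the command prompt as follows. This is only an illustration: the repository, the file name analysis.R, and the user identity are hypothetical, and the commands run in a temporary folder.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.org"

# Version 1 of a hypothetical script.
printf 'x <- 1\ny <- 2\n' > analysis.R
git add analysis.R
git commit -q -m "First version"
first_sha="$(git rev-parse HEAD)"

# Version 2: one line changed, one line added.
printf 'x <- 1\ny <- 3\nz <- x + y\n' > analysis.R
git commit -q -am "Second version"
second_sha="$(git rev-parse HEAD)"

# Compare the two versions of the file by SHA.
git --no-pager diff "$first_sha" "$second_sha" -- analysis.R
```

In the output, the line beginning with @@ gives the starting line and the number of lines for each version (here, 2 lines starting at line 1 in the old version and 3 lines starting at line 1 in the new one). Lines prefixed with - were removed and lines prefixed with + were added.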
Question 4:
Is it possible to initiate version control in a pre-existing project to track future changes?
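- Yes. Running git init inside an existing project folder starts version control there. Files already in the folder are left as they are and can then be added and committed so that future changes are tracked.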
Question 5:
Is it helpful to use Git Desktop?
- I think these can be helpful. A couple of desktop clients that I know of are GitHub Desktop and GitKraken.
Question 6:
For Biowulf, the Git path is /user/bin/git. How do we port that to a central git for connectivity with other R users? Or, are we to create a group dir on Biowulf with cross user permissions added and each of us change the global options path to save and pull from there?
- I think having a GitHub repository that each group member can access would be the way to go. You can also work in a shared group folder, as long as it has the proper permissions.
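For the shared-folder route, one possible arrangement (a sketch only, not something tested on Biowulf) is a bare repository that acts as the central copy, which each user clones. The paths and the user names alice and bob are hypothetical. A temporary folder stands in for the group directory, whose own permissions would still need to be set for the group.

```shell
work="$(mktemp -d)"

# A bare repository acts as the central copy. --shared=group asks Git to
# keep the repository files group-writable.
shared="$work/shared/project.git"   # hypothetical path standing in for a group directory
mkdir -p "$work/shared"
git init -q --bare --shared=group "$shared"

# First user clones the (still empty) central copy and adds a script.
git clone -q "$shared" "$work/alice"
cd "$work/alice"
git config user.name "Alice Example"       # placeholder identity
git config user.email "alice@example.org"
echo 'x <- 1' > analysis.R
git add analysis.R
git commit -q -m "Add analysis script"
git push -q -u origin HEAD

# Second user clones the central copy and receives that script.
git clone -q "$shared" "$work/bob"
ls "$work/bob"
```

From then on, each user would run git pull to receive the others' commits and git push to share their own. A GitHub repository plays the same role as the bare repository here.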
Question 7:
Can Git be used for other languages such as Python or Matlab?
- Yes, Git tracks files regardless of language, so it can be used with Python (https://realpython.com/python-git-github-intro/) and MATLAB (https://www.mathworks.com/help/matlab/matlab-git-source-control.html?s_tid=CRUX_lftnav) as well as R.
Question 8:
If you have multiple contributors to a file, will Git track the changes based on contributor?
- Yes, it will track changes by contributor, because each commit records the name and email of its author. One way to collaborate is to share the project on GitHub; everyone on your team can then pull it to their personal computer, work on it, and share additions back on GitHub. If you are doing this on Biowulf, I think you just need to make sure your group has the correct read/write permissions set up for the shared folder (contact Biowulf staff for troubleshooting, since I haven't done a Git collaboration project on the cluster).
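To see how this per-contributor record can be inspected, here is a small sketch. For brevity it simulates two contributors inside a single temporary repository. The names, emails, and file name are hypothetical.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q

# First contributor adds a line (identity passed for this commit only).
echo 'x <- 1' > analysis.R
git add analysis.R
git -c user.name="Alice Example" -c user.email="alice@example.org" \
    commit -q -m "Add x"

# Second contributor adds another line.
echo 'y <- 2' >> analysis.R
git -c user.name="Bob Example" -c user.email="bob@example.org" \
    commit -q -am "Add y"

# Who made which commit.
git --no-pager log --format='%an: %s'

# Number of commits per contributor.
git --no-pager shortlog -sn HEAD

# Who last changed each line of the file.
git --no-pager blame analysis.R

# Only the commits by one contributor.
git --no-pager log --author="Bob" --format='%s'
```

In a real collaboration each person would set their own name and email once with git config, and the same commands would work on the shared history.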
Question 9:
What is the difference between saving and committing?
- Saving just writes your current work to disk. For instance, suppose you are working on a Word document. You might save it at the end of the day on Thursday and then again on Friday, which is fine because you won't lose your work. But if on Monday a colleague asks you to pull up the version of the document from Thursday, there is no way for you to do that, because the work on Friday added to and overwrote the work from Thursday. In this case, in addition to saving, you would want to take a snapshot of how the document looked on Thursday. This is what committing does: it creates a snapshot of the script(s) in a project so you can easily refer to what they looked like at a given point in time.
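The Thursday and Friday scenario can be sketched with Git as follows. A plain text file is used because Git shows text files most readably. The file name and user identity are hypothetical, and everything runs in a temporary folder.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.org"

# Thursday: save the file, then commit (take a snapshot).
echo 'Draft written on Thursday' > notes.txt
git add notes.txt
git commit -q -m "Thursday version"
thursday_sha="$(git rev-parse HEAD)"

# Friday: saving overwrites the file on disk; commit again.
echo 'Draft rewritten on Friday' > notes.txt
git commit -q -am "Friday version"

# Monday: the file on disk holds Friday's text...
cat notes.txt

# ...but Thursday's snapshot can still be retrieved by its SHA.
git --no-pager show "$thursday_sha:notes.txt"
```

Without the Thursday commit, only the Friday text would exist, which is the situation described for the Word document.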
Question 10:
Is pushing to GitHub the same as uploading and is pulling the same as downloading?
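- I think these are good analogies. One difference is that pushing and pulling transfer commits (the saved history) rather than only the current files, and pulling also merges the incoming commits into your local copy.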
Question 11:
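Where should I install Git on my local computer, and should I use Git desktop for version control?
- You generally do not need to choose a location. On a PC, I think Git installs under the Program Files folder by default. On a Mac, Git is a command-line tool rather than an app in the Applications folder; running which git in a terminal shows where it is installed. A Git desktop client will be useful for versioning (see response to Question 5).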