It seems that Git is already tracking the files I added to .gitignore

I do not want to add .csv files to my GitHub repository, so I've written this in .gitignore:

# Compiled source #
###################
*.csv

But when I push, the terminal answers:

mike@mike-thinks:~/Data/on_2018_04_25_16_43_17$ git push origin master
Username for 'https://github.com': Antoinecomp
Password for 'https://Antoinecomp@github.com': 
Counting objects: 52, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (50/50), done.
Writing objects: 100% (52/52), 20.38 MiB | 2.06 MiB/s, done.
Total 52 (delta 14), reused 0 (delta 0)
remote: Resolving deltas: 100% (14/14), done.
remote: error: GH001: Large files detected. You may want to try Git Large File Storage - https://git-lfs.github.com.
remote: error: Trace: 12b8a656f736b8819c2b79c59ba5bace
remote: error: See http://git.io/iEPt8g for more information.
remote: error: File kis20180424140552.xml is 254.75 MB; this exceeds GitHub's file size limit of 100.00 MB
To https://github.com/antoinecomp/classificationTree.git
 ! [remote rejected] master -> master (pre-receive hook declined)
error: failed to push some refs to 'https://github.com/antoinecomp/classificationTree.git'

Then I tried to clear the cache, but that didn't work either:

mike@mike-thinks:~/Data/on_2018_04_25_16_43_17$ git rm --cached option
fatal: pathspec 'option' did not match any files

How can I make Git stop complaining that the CSV files are too large to store?

Update:

Here are the contents of the repository:

mike@mike-thinks:~/Data/on_2018_04_25_16_43_17$ ls
AccreditationByHep.csv              kis20180424140552.xml
AccreditationByHepModified.csv      KISAIM.csv
AccreditationByHepModifiedTest.txt  KISCOURSE.csv
ACCREDITATION.csv                   LOCATION.csv
ACCREDITATIONTABLE.csv              logstash_AccreditationByHep.config
COMMON.csv                          Main.py
CONTINUATION.csv                    NHSNSS.csv
COURSELOCATION.csv                  NSS.csv
DEGREECLASS.csv                     __pycache__
Elastic.py                          README.md
EMPLOYMENT.csv                      readme.txt
ENTRY.csv                           SALARY.csv
GetCourses.py                       SBJ.csv
get-pip.py                          setServer.py
INSTITUTION.csv                     TARIFF.csv
JOBLIST.csv                         Trees.xlsx
JOBTYPE.csv                         UCASCOURSEID.csv

Tags: git, csv

asked by ThePassenger on 15.05.2018 at 17:33

score 5 · Accepted Answer

GitHub is complaining about the file kis20180424140552.xml, not about a CSV.

More generally, if you have added large files to version control (XML, CSV, or anything else), then after adding *.csv or *.xml to .gitignore you can stop tracking them with

git rm --cached <relative path of the file>

In your case, option is apparently not a file or subdirectory at the root of the project, so git rm cannot find it. What you probably want is something like

git rm --cached subdirectory/*.csv

Then commit the state in which those files have been removed from version control (they are still physically on disk; they are just ignored).
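The untrack-and-commit steps can be sketched as follows, in a throwaway repository created with mktemp. The file name data.csv and the committer identity are hypothetical and exist only for the demo. The script shows that adding a tracked file to .gitignore does not untrack it. Only `git rm --cached` removes it from the index, and the file stays on disk afterwards.

```shell
#!/bin/sh
# Demo in a throwaway repo; data.csv and the identity below are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"              # local identity so commits work anywhere
git config user.email "demo@example.com"

echo "a,b" > data.csv
git add data.csv
git commit -q -m "Add CSV by mistake"

echo "*.csv" > .gitignore                # ignoring it now does NOT untrack it
git ls-files                             # still prints data.csv

git rm -q --cached data.csv              # remove from the index only
git add .gitignore
git commit -q -m "Stop tracking CSV files"

git ls-files                             # prints only .gitignore
ls data.csv                              # the file is still in the working tree
git check-ignore data.csv                # and it is now ignored
```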

BUUUUUT... if that large file was ever in your version control, it is still in the history, and the idea of Git is that the project keeps references to every file that has ever passed through version control. However much you remove it from your HEAD, the history still references it, and when you push to GitHub, Git will try to upload it.
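Here is a small sketch of that point, again in a throwaway repository. The file big.xml is a hypothetical stand-in for the large file. After it is untracked and the removal is committed, HEAD no longer contains the file. However, `git rev-list --objects --all` still reaches its blob through the older commit.

```shell
#!/bin/sh
# Demo in a throwaway repo; big.xml is a hypothetical stand-in for the large file.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo "pretend this is 250 MB" > big.xml
git add big.xml
git commit -q -m "Add big file"

git rm -q --cached big.xml               # untrack it...
git commit -q -m "Untrack big file"      # ...and commit the removal

git ls-files                             # the index is now empty
git rev-list --objects --all | grep big.xml   # but history still reaches the blob
```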

How do you delete that file from the history for good?

Well, I would start by listing the files that are larger than GitHub's limit (100 MB):

# List objects anywhere in the history that are at least 100 MiB (GitHub's limit)
git rev-list --objects --all \
| git cat-file --batch-check='%(objectsize) %(rest)' \
| awk '$1 >= 100 * 2^20' \
| sort --numeric-sort --key=1 \
| numfmt --field=1 --to=iec-i --suffix=B --padding=7 --round=nearest

This prints a list in which you would see, for example:

255MiB kis20180424140552.xml
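To try the listing command without a 100 MB file, the following sketch runs the same pipeline on a throwaway repository with the threshold lowered to 1 MiB. The file names big.bin and small.txt are hypothetical. The script assumes GNU coreutils, because it relies on numfmt and the long sort options.

```shell
#!/bin/sh
# Same pipeline, threshold lowered to 1 MiB; file names are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

head -c 2097152 /dev/zero > big.bin      # 2 MiB file
echo "small" > small.txt
git add big.bin small.txt
git commit -q -m "Add files"

# %(objectsize) is the uncompressed size, so the zeros count in full
large=$(git rev-list --objects --all \
  | git cat-file --batch-check='%(objectsize) %(rest)' \
  | awk '$1 >= 1 * 2^20' \
  | sort --numeric-sort --key=1 \
  | numfmt --field=1 --to=iec-i --suffix=B --padding=7 --round=nearest)
echo "$large"                            # only big.bin passes the filter
```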

So you need to delete that file from the history for good. Just in case, copy it to another folder first, so you can put it back when the process finishes if it gets lost.

To delete that file from the history:

git filter-branch -f --index-filter 'git rm --cached --ignore-unmatch kis20180424140552.xml' HEAD

To remove all *.csv files from the history:

git filter-branch -f --index-filter 'git rm --cached --ignore-unmatch "*.csv"' HEAD
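The sketch below combines both filter-branch commands into a single index filter. It runs on a throwaway repository with hypothetical files big.xml, data.csv and main.py. Setting FILTER_BRANCH_SQUELCH_WARNING=1 only suppresses the warning and pause that recent Git versions show before running filter-branch. Quoting "*.csv" inside the filter passes the pattern to git rm as a pathspec, so the shell does not expand it.

```shell
#!/bin/sh
# Demo in a throwaway repo; big.xml, data.csv and main.py are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo "payload" > big.xml
echo "a,b" > data.csv
echo "print('hi')" > main.py
git add big.xml data.csv main.py
git commit -q -m "Initial commit"
echo "print('bye')" >> main.py
git commit -q -am "Second commit"

export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the warning and pause
git filter-branch -f --index-filter \
  'git rm -q --cached --ignore-unmatch big.xml "*.csv"' HEAD

git log --oneline HEAD -- big.xml data.csv   # prints nothing: no commit has them
git ls-tree -r --name-only HEAD              # prints only main.py
```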

But that file (or those files) is still there, referenced from your reflog and from the backup refs that filter-branch saves under refs/original, so the next step would be:

rm -rf .git/refs/original/
git reflog expire --expire=now --all
git gc --prune=now  # optionally add --aggressive, which is more thorough

This deletes the backup references to the original commits, expires your reflog, and runs the garbage collector, which removes the objects that are no longer referenced.
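The following sketch shows why these cleanup steps matter, using a throwaway repository with a hypothetical big.xml. Right after filter-branch, the old blob is still in the object store. The file disappears only after the backup refs and reflog entries are dropped and the garbage collector prunes it. The sketch uses --expire=now and --prune=now so that the default grace periods do not keep recent entries and objects alive.

```shell
#!/bin/sh
# Demo in a throwaway repo; big.xml and main.py are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo "payload" > big.xml
echo "code" > main.py
git add big.xml main.py
git commit -q -m "Initial commit"
blob=$(git rev-parse HEAD:big.xml)       # object id of big.xml's contents

export FILTER_BRANCH_SQUELCH_WARNING=1
git filter-branch -f --index-filter \
  'git rm -q --cached --ignore-unmatch big.xml' HEAD >/dev/null

git cat-file -e "$blob" && echo "blob still stored after filter-branch"

rm -rf .git/refs/original/               # drop filter-branch's backup refs
git reflog expire --expire=now --all     # drop every reflog entry
git gc --quiet --prune=now               # delete unreachable objects now

if git cat-file -e "$blob" 2>/dev/null; then
  echo "blob still present"
else
  echo "blob removed"
fi
```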

Finally, after running filter-branch your commit history will have changed: it has rewritten your history as if those files had never existed. That means the hashes of your local commits no longer match the ones on GitHub, so you'll have to push using --force:

git push origin master --force
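This sketch shows why the force push is needed. It uses a local bare repository (remote.git, a hypothetical stand-in for GitHub). After the rewrite, the new commit is not a descendant of the commit already on the remote. A plain push is therefore rejected as non-fast-forward, and only --force replaces the remote branch.

```shell
#!/bin/sh
# Demo with a local bare repo standing in for GitHub; all names are hypothetical.
set -e
base=$(mktemp -d)
git init -q --bare "$base/remote.git"
git init -q "$base/work"
cd "$base/work"
git config user.name "Demo"
git config user.email "demo@example.com"
git remote add origin "$base/remote.git"

echo "payload" > big.xml
echo "code" > main.py
git add big.xml main.py
git commit -q -m "Initial commit"
git push -q origin HEAD:refs/heads/master

export FILTER_BRANCH_SQUELCH_WARNING=1
git filter-branch -f --index-filter \
  'git rm -q --cached --ignore-unmatch big.xml' HEAD >/dev/null

# A plain push fails: the rewritten commit does not descend from the remote one
git push -q origin HEAD:refs/heads/master 2>/dev/null || echo "rejected (non-fast-forward)"
# --force replaces the remote branch with the rewritten history
git push -q --force origin HEAD:refs/heads/master
```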

At the end of the procedure, the files you deleted from the history are still in your working directory; they are just no longer under version control.

P.S. If at some point you created tags in your repo and those tags contain the file, you will also have to clean the tags.
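Tags can be cleaned in the same pass. This sketch assumes a hypothetical annotated tag v1.0 in a throwaway repository. The filter-branch option --tag-name-filter cat recreates tags under their same names, and any tag signature is dropped in the process. Passing -- --all makes filter-branch process every branch and tag instead of only HEAD.

```shell
#!/bin/sh
# Demo in a throwaway repo; big.xml, main.py and tag v1.0 are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo "payload" > big.xml
echo "code" > main.py
git add big.xml main.py
git commit -q -m "Initial commit"
git tag -a v1.0 -m "Release 1.0"         # annotated tag containing big.xml

export FILTER_BRANCH_SQUELCH_WARNING=1
# --tag-name-filter cat: rewrite tags keeping their names
# -- --all: rewrite every branch and tag, not just HEAD
git filter-branch -f --tag-name-filter cat --index-filter \
  'git rm -q --cached --ignore-unmatch big.xml' -- --all

git ls-tree -r --name-only v1.0          # the tag no longer contains big.xml
```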