I do not think that glob is the best option for analyzing the contents of a folder. In my opinion, a utility like this could be based on os.listdir(), which returns a list with the names of all the files and folders it finds, so you can easily classify them by extension:
import os
from collections import defaultdict

# Maps each extension to the list of names that have it
clasificados = defaultdict(list)
for nombre in os.listdir():
    if "." in nombre:
        # Take the text after the last dot, in lowercase
        extension = nombre.split(".")[-1].lower()
    else:
        extension = ""
    clasificados[extension].append(nombre)

for k, v in clasificados.items():
    print("Extensión '{}': {} ficheros".format(k, len(v)))
Explanation
We build a dictionary whose keys are the file extensions and whose values are lists of the names that have that extension. The extension is obtained by calling split() on the dot (if the name contains one), taking the last piece and converting it to lowercase; if there is no dot, an empty extension is used. Once everything is classified in that dictionary, we can iterate over it to show how many items there are for each extension. In my case, running it in my "Downloads" folder prints something like this:
Extensión 'pdf': 27 ficheros
Extensión 'zip': 1 ficheros
Extensión 'xlsx': 1 ficheros
Extensión 'png': 5 ficheros
Extensión 'py': 1 ficheros
Extensión 'gif': 3 ficheros
Extensión '': 2 ficheros
etc...
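One detail of the split(".") approach is that a name starting with a dot, such as .bashrc, ends up with the extension 'bashrc'. If that is not what you want, a possible variant is to use os.path.splitext(), which ignores leading dots. The following is only a sketch; the helper names extension_of and classify_names are made up for illustration and are not part of the code above:

```python
import os
from collections import defaultdict


def extension_of(name):
    """Return the lowercased extension without the dot, or "" if there is none."""
    # os.path.splitext ignores leading dots, so ".bashrc" has no extension
    return os.path.splitext(name)[1].lstrip(".").lower()


def classify_names(names):
    """Group the given names by extension (hypothetical helper)."""
    classified = defaultdict(list)
    for name in names:
        classified[extension_of(name)].append(name)
    return classified


if __name__ == "__main__":
    # Same report as before, for the current directory
    for ext, names in classify_names(os.listdir()).items():
        print("'{}': {} files".format(ext, len(names)))
```

For ordinary names like informe.PDF or archive.tar.gz the result is the same as with split(".") ('pdf' and 'gz'); only names with a leading dot are treated differently.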
Detect directories
As you can see, the above cannot detect directories, since there is no way to tell from a name alone whether something is a directory or not. But the os module provides other ways to iterate over the contents of a folder.
With os.scandir(), instead of file names we get objects of type os.DirEntry, which offer methods and attributes that give additional information about each entry. One of these methods is .is_dir(), which returns True if the entry is a directory; another is .is_file(), which returns True if it is a regular file. The .name attribute gives us the name, in case we want to look at its extension.
Using this, we can classify by extension only the entries that really are files, and count the directories separately:
import os
from collections import defaultdict

clasificados = defaultdict(list)
directorios = []
for elemento in os.scandir():
    nombre = elemento.name
    if elemento.is_dir():
        # Directories are collected separately, without looking at the name
        directorios.append(nombre)
    else:
        if "." in nombre:
            extension = nombre.split(".")[-1].lower()
        else:
            extension = ""
        clasificados[extension].append(nombre)

# Report: directories first, then files grouped by extension
print("{} directorios".format(len(directorios)))
for d in directorios:
    print("- {}".format(d))
print("{} ficheros".format(sum(len(caso) for caso in clasificados.values())))
for k, v in clasificados.items():
    print("- '{}': {} ficheros".format(k, len(v)))
And the output is something like:
2 directorios
- Safari
- tmp
52 ficheros
- 'pdf': 27 ficheros
- 'zip': 1 ficheros
- 'xlsx': 1 ficheros
- 'png': 5 ficheros
- 'py': 1 ficheros
- 'gif': 3 ficheros
...etc
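If you want to reuse this for folders other than the current one, the same logic can be wrapped in a function. This is just a sketch under my own assumptions: the name classify_dir and its path parameter are hypothetical, and the with block relies on os.scandir() being usable as a context manager (Python 3.6 or later):

```python
import os
from collections import defaultdict


def classify_dir(path="."):
    """Return (directories, files_by_extension) for one folder (hypothetical helper)."""
    directories = []
    classified = defaultdict(list)
    # The context manager closes the scandir iterator when we are done
    with os.scandir(path) as entries:
        for entry in entries:
            name = entry.name
            if entry.is_dir():
                directories.append(name)
            else:
                # Same rule as above: text after the last dot, lowercased
                extension = name.split(".")[-1].lower() if "." in name else ""
                classified[extension].append(name)
    return sorted(directories), dict(classified)


if __name__ == "__main__":
    dirs, files = classify_dir()
    print("{} directories, {} files".format(
        len(dirs), sum(len(names) for names in files.values())))
```

Returning the data instead of printing it lets the caller decide how to present it, for example with the same print loops used above.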