Good afternoon, I try to import a csv file in Cassandra that is very long. It is dealing with food products: ingredients, nutrition, labels. It comes from Open Food Facts. It lists information about food products: ingredients, nutritional information, labels, etc. The majority of the data comes from crowdsourcing information. The file is on the open French public data platform data.gouv.fr
I try the following command with all the columns that I was able to compile with a python script:
cqlsh> COPY bouffe(code, url, creator, created_t, created_datetime, last_modified_t, last_modified_datetime, product_name, generic_name, quantity, packaging, packaging_tags, brands, brands_tags, categories, categories_tags, categories_fr, origins, origins_tags, manufacturing_places, manufacturing_places_tags, labels, labels_tags, labels_fr, emb_codes, emb_codes_tags, first_packaging_code_geo, cities, cities_tags, purchase_places, stores, countries, countries_tags, countries_fr, ingredients_text, allergens, allergens_fr, traces, traces_tags, traces_fr, serving_size, no_nutriments, additives_n, additives, additives_tags, additives_fr, ingredients_from_palm_oil_n, ingredients_from_palm_oil, ingredients_from_palm_oil_tags, ingredients_that_may_be_from_palm_oil_n, ingredients_that_may_be_from_palm_oil, ingredients_that_may_be_from_palm_oil_tags, nutrition_grade_uk, nutrition_grade_fr, pnns_groups_1, pnns_groups_2, states, states_tags, states_fr, main_category, main_category_fr, image_url, image_small_url, energy_100g, energy-from-fat_100g, fat_100g, saturated-fat_100g, butyric-acid_100g, caproic-acid_100g, caprylic-acid_100g, capric-acid_100g, lauric-acid_100g, myristic-acid_100g, palmitic-acid_100g, stearic-acid_100g, arachidic-acid_100g, behenic-acid_100g, lignoceric-acid_100g, cerotic-acid_100g, montanic-acid_100g, melissic-acid_100g, monounsaturated-fat_100g, polyunsaturated-fat_100g, omega-3-fat_100g, alpha-linolenic-acid_100g, eicosapentaenoic-acid_100g, docosahexaenoic-acid_100g, omega-6-fat_100g, linoleic-acid_100g, arachidonic-acid_100g, gamma-linolenic-acid_100g, dihomo-gamma-linolenic-acid_100g, omega-9-fat_100g, oleic-acid_100g, elaidic-acid_100g, gondoic-acid_100g, mead-acid_100g, erucic-acid_100g, nervonic-acid_100g, trans-fat_100g, cholesterol_100g, carbohydrates_100g, sugars_100g, sucrose_100g, glucose_100g, fructose_100g, lactose_100g, maltose_100g, maltodextrins_100g, starch_100g, polyols_100g, fiber_100g, proteins_100g, casein_100g, serum-proteins_100g, nucleotides_100g, salt_100g, sodium_100g, alcohol_100g, vitamin-a_100g, beta-carotene_100g, vitamin-d_100g, vitamin-e_100g, vitamin-k_100g, vitamin-c_100g, vitamin-b1_100g, vitamin-b2_100g, vitamin-pp_100g, vitamin-b6_100g, vitamin-b9_100g, folates_100g, vitamin-b12_100g, biotin_100g, pantothenic-acid_100g, silica_100g, bicarbonate_100g, potassium_100g, chloride_100g, calcium_100g, phosphorus_100g, iron_100g, magnesium_100g, zinc_100g, copper_100g, manganese_100g, fluoride_100g, selenium_100g, chromium_100g, molybdenum_100g, iodine_100g, caffeine_100g, taurine_100g, ph_100g, fruits-vegetables-nuts_100g, fruits-vegetables-nuts-estimate_100g, collagen-meat-protein-ratio_100g, cocoa_100g, chlorophyl_100g, carbon-footprint_100g, nutrition-score-fr_100g, nutrition-score-uk_100g, glycemic-index_100g, water-hardness_100g) FROM 'bouffe.csv' WITH HEADER = true;
But it does not work
- there are columns that are not below a column heading after
of
water-hardness_100g
:
- I think missing column types.
It would be:
create ColumnFamily Bouffe
(Code varchar PRIMARY KEY,
url varchar,
...
)
But how to do for 162 minimum columns without counting those that maybe should not be taken into account because without column headers. I thought about a Python script but I have to do it, I would get the names of each column before I paste them vartext. To do this:
CREATE TABLE emp(
emp_id int PRIMARY KEY,
emp_name text,
emp_city text,
emp_sal varint,
emp_phone varint
);
As shown in the tutorialspoint table creation example .
Can you help me?