HIVE - Error: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement

3

Very good. The first thing I have to say is that I am a first-time user of Hive, so my mistake may be a beginner's mistake. My problem is this:

This is trying to create a table with five columns (game-clock, time, game-event, shot-clock, location) from the following test XML:

<sequences period=”5″>
<moment game-clock=”300.00″ time=”1433732215737″ game-event-id=”” shot-clock=”24.00″ locations=”-1,-1,96.95182,1.98648,5.75987;9,173004,45.54661,17.35545,0;9,338365,24.04722,25.67399,0;9,457611,46.95292,27.93478,0;9,468895,48.59834,33.96586,0;9,552336,33.73381,24.05929,0;5,214152,59.26872,24.12006,0;5,253997,45.71551,17.41071,0;5,457186,48.59834,33.96586,0;5,531447,78.09629,34.24688,0;5,552806,47.79678,22.8155,0″ /> 
<moment game-clock=”300.00″ time=”1433732215794″ game-event-id=”” shot-clock=”24.00″ locations=”-1,-1,97.79683,.89407,3.67626;9,173004,45.62283,17.34854,0;9,338365,24.04248,25.6784,0;9,457611,46.84978,27.8463,0;9,468895,48.52017,33.89189,0;9,552336,33.74064,24.03493,0;5,214152,59.27282,24.07895,0;5,253997,45.87101,17.38532,0;5,457186,48.52017,33.89189,0;5,531447,78.06394,34.2155,0;5,552806,47.8269,22.81393,0″ /> 
<moment game-clock=”300.00″ time=”1433732215829″ game-event-id=”” shot-clock=”24.00″ locations=”-1,-1,97.78946,.91006,3.68332;9,173004,45.61995,17.35703,0;9,338365,24.03815,25.68402,0;9,457611,46.71077,27.71191,0;9,468895,48.37095,33.77756,0;9,552336,33.74769,24.00829,0;5,214152,59.27627,24.06055,0;5,253997,46.00077,17.36555,0;5,457186,48.37095,33.77756,0;5,531447,78.0439,34.20521,0;5,552806,47.84297,22.83292,0″ /> 
<moment game-clock=”300.00″ time=”1433732215856″ game-event-id=”” shot-clock=”24.00″ locations=”-1,-1,97.73786,1.02206,3.73271;9,173004,45.57851,17.34979,0;9,338365,24.04207,25.61049,0;9,457611,46.63871,27.56226,0;9,468895,48.2033,33.7142,0;9,552336,33.75497,23.97935,0;5,214152,59.27906,24.06485,0;5,253997,46.10481,17.35141,0;5,457186,48.29748,33.63262,0;5,531447,78.03618,34.216,0;5,552806,47.84498,22.87247,0″ /> 
<moment game-clock=”300.00″ time=”1433732215905″ game-event-id=”” shot-clock=”24.00″ locations=”-1,-1,97.59781,1.32606,3.8668;9,173004,45.57865,17.34643,0;9,338365,24.04224,25.61058,0;9,457611,46.56615,27.44014,0;9,468895,48.01722,33.7018,0;9,552336,33.76247,23.94813,0;5,214152,59.27976,24.07223,0;5,253997,46.26668,17.38672,0;5,457186,48.29974,33.45708,0;5,531447,78.02931,34.2208,0;5,552806,47.86752,22.85019,0 ″/>

For this, I'm using the SerDe regex to create the table. The CREATE TABLE that I am using is the following:

CREATE EXTERNAL TABLE sample_xml (
 gameClock string,
 time string,
 gameEventId string,
 shotClock string, 
 locations string 
) 
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' 
WITH SERDEPROPERTIES ('input.regex'='(\d+.\d+)″\stime=”(\d+)″\sgame-event-id=”[0-9]*”\sshot-clock=”(\d+.\d+)″\slocations=(”(-?\d,-?\d+,\d+\.\d+,\d*\.\d+,\d\.?\d*;?)+\s?″)') 
LOCATION '/tmp/pruebas/xml';

When executing the CREATE TABLE I receive the following error:

org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: ParseException line 9:37 mismatched input '(' expecting StringLiteral near '=' in specifying key/value property

Any idea what I'm doing wrong?

    
asked by Jonzz 08.03.2017 в 17:23
source

1 answer

0
  

DO NOT RESOLVE THE PROBLEM . As expressed by the author, this answer does not solve the problem he is having (he still shows the same error). I had eliminated this response, and then I put a reward to look for it to be answered ... without success. I recover the answer, in case someone serves as a guide to help you answer correctly, or if you are having a similar problem and some of this can help you.
  My reward remains implicit for anyone who demonstrates a proven solution (ideally explaining generalities of applying a regex in SERDEPROPERTIES). Just leave a message under this response to receive it.

     

I also add a suggested modification by Pablo Lozano , escaping the ; , that the author of the question he did not evaluate (he would not seem to be active).

     

Follow my original answer (modified with this last one) ...

  

Note: I have no idea about Hive, but I risk an answer

According to Row Formats & SerDe , SERDEPROPERTIES parameters must be passed with double quotes.

Also, to correct in your regex:

  • The semicolons ( ; ) serve to separate SERDE properties, therefore, within the expression they must escape as \; .
    More info on Hive CREATE table failed using regular experssion serde .
  • As the regex is interpreted by the Java API, the \ should be escaped as \ .
  • A point such as \. missed.
  • It was still necessary to create a group (parentheses) for game-event .
  • The last group should not generate a capture. Instead of (grupo) we use (?:grupo) to be able to repeat it, but without capturing it.
  • The parentheses of locations should be inside of the quotes.
CREATE EXTERNAL TABLE sample_xml (
 gameClock string,
 time string,
 gameEventId string,
 shotClock string, 
 locations string 
) 
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' 
WITH SERDEPROPERTIES ("input.regex"="(\d+\.\d+)″\stime=”(\d+)″\sgame-event-id=”([0-9]*)”\sshot-clock=”(\d+\.\d+)″\slocations=”((?:-?\d,-?\d+,\d+\.\d+,\d*\.\d+,\d+(?:\.\d+)?\;?)+)\s?″") 
LOCATION '/tmp/pruebas/xml';
    
answered by 08.03.2017 в 18:54