Schema and datatypes
Schemas are defined using SQL syntax. The table below lists the supported datatypes, their SQL names, and the corresponding value types in Python and Java:
| Data type | SQL name | Value type in Python | Value type in Java |
|---|---|---|---|
| ArrayType | ARRAY<element_type> | list, tuple, or array | java.util.List |
| BinaryType | BINARY | bytearray | byte[] |
| BooleanType | BOOLEAN | bool | boolean or Boolean |
| ByteType | BYTE, TINYINT | int or long | byte or Byte |
| DateType | DATE | datetime.date | java.time.LocalDate or java.sql.Date |
| DayTimeIntervalType | INTERVAL DAY, INTERVAL DAY TO HOUR, INTERVAL DAY TO MINUTE, INTERVAL DAY TO SECOND | datetime.timedelta | java.time.Duration |
| DayTimeIntervalType | INTERVAL HOUR, INTERVAL HOUR TO MINUTE, INTERVAL HOUR TO SECOND | datetime.timedelta | java.time.Duration |
| DayTimeIntervalType | INTERVAL MINUTE, INTERVAL MINUTE TO SECOND, INTERVAL SECOND | datetime.timedelta | java.time.Duration |
| DecimalType | DECIMAL, DEC, NUMERIC | decimal.Decimal | java.math.BigDecimal |
| DoubleType | DOUBLE | float | double or Double |
| FloatType | FLOAT, REAL | float | float or Float |
| IntegerType | INT, INTEGER | int or long | int or Integer |
| LongType | LONG, BIGINT | int or long | long or Long |
| MapType | MAP<key_type, value_type> | dict | java.util.Map |
| ShortType | SHORT, SMALLINT | int or long | short or Short |
| StringType | STRING | string | String |
| StructType | STRUCT<field1_name: field1_type, field2_name: field2_type, …> | list or tuple | org.apache.spark.sql.Row |
| TimestampNTZType | TIMESTAMP_NTZ | datetime.datetime | java.time.LocalDateTime |
| TimestampType | TIMESTAMP, TIMESTAMP_LTZ | datetime.datetime | java.time.Instant or java.sql.Timestamp |
| YearMonthIntervalType | INTERVAL YEAR, INTERVAL YEAR TO MONTH, INTERVAL MONTH | datetime.timedelta | java.time.Period |
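
As a rough illustration of how the table can be read, the sketch below writes a schema as a SQL-style column list and pairs it with one row built from the Python value types in the third column. The column names, the sample values, and the `row_matches` helper are hypothetical and exist only for this example. How the schema string is actually passed to the API is not shown here.

```python
import datetime
import decimal

# Schema written as a SQL-style column list, using SQL names from the table.
# Column names are made up for illustration.
schema = (
    "id BIGINT, name STRING, price DECIMAL, score DOUBLE, "
    "active BOOLEAN, created DATE, elapsed INTERVAL DAY TO SECOND, "
    "tags ARRAY<STRING>, attrs MAP<STRING, INT>"
)

# One row whose values use the Python value types listed in the table.
row = (
    1,                                        # BIGINT -> int
    "widget",                                 # STRING -> string
    decimal.Decimal("9.99"),                  # DECIMAL -> decimal.Decimal
    0.75,                                     # DOUBLE -> float
    True,                                     # BOOLEAN -> bool
    datetime.date(2024, 1, 31),               # DATE -> datetime.date
    datetime.timedelta(hours=1, minutes=30),  # INTERVAL DAY TO SECOND -> datetime.timedelta
    ["a", "b"],                               # ARRAY<STRING> -> list
    {"weight": 3},                            # MAP<STRING, INT> -> dict
)

# Expected Python type for each column, in schema order (hypothetical helper data).
expected_types = (
    int, str, decimal.Decimal, float, bool,
    datetime.date, datetime.timedelta, list, dict,
)


def row_matches(row, expected):
    """Return True if the row has one value of the expected type per column."""
    return len(row) == len(expected) and all(
        isinstance(value, expected_type)
        for value, expected_type in zip(row, expected)
    )


print(row_matches(row, expected_types))
```

A check like this is only a local sanity test on value types. It does not parse the schema string or validate ranges such as the bounds of TINYINT or SMALLINT.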