Athena query result management

Amazon Athena is a serverless interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL
It is a "pay as you go service", you pay only for the queries that you run

Athena uses Presto under the hood to run queries. Presto is a distributed query engine for big data using the SQL query language.
Athena also uses Apache Hive. Apache Hive gives an SQL-like interface to query data stored in various databases and file systems that integrate with Hadoop

Athena stores query results and metadata information for each query that runs automatically.
These results are stored in an S3 bucket that we can specify.
We can also encrypt the files if required.
Athena supports DML, and DDL queries.
Files are saved based on the name of the query, and the query ID. QueryID is a unique identifier that Athena assigns to each query when it runs. I.e: 0000e975-9f92-4adb-b853-71fc97dac20e.csv
For each query, the result will contain two files
1. Actual data
2. Metadata
DML query results are saved (CSV) format.
DDL query results are saved as plain text files.
Metadata files are saved in binary format.

DDL (Data Definition Language) queries include CREATE TABLE and ALTER TABLE ADD PARTITION.
There are no charges for DDL queries.
DML (Data Manipulation Language) queries include SELECT, CREATE TABLE AS (CTAS), and INSERT INTO queries.
DML queries are charged for the number of bytes scanned rounded up to the nearest megabyte

We can directly access/download the files from Athena Console/S3.
We can use AWS CLI / API to get the query results from Athena based on the unique QueryIds.
To access and view query output files, IAM principals (users and roles) need permission for the Amazon S3 GetObject action for the query result location, as well as permission for the Athena GetQueryResults action.
If the location is encrypted, users must have the appropriate key permissions to encrypt and decrypt the query result location.

Athena keeps a query history for 45 days.
CloudTrail keeps the log for 90 days.
But the Athena query results are kept forever.
Over a period of time, the query results get accumulated and incur a cost when kept in S3.

You can safely delete the old files as long as you don't want to go back and review the results of that particular query.
But keep in mind once the data is deleted it can’t be restored.
Also while trying to access the result will throw a "Could not find results" error.
We can always run the query to get the results again.
For any reason such as unexpected Athena usage, if we want the query results we will not be able to get them.

Use Athena Saved queries wherever possible.
Create and maintain a separate bucket for Athena query results.
Setup an S3 lifecycle to maintain the query results.
Set Up a monitoring system for collecting and storing the Athena logs periodically into the S3 bucket from Cloud Trail.

GURURAJAN's Blog