StarTree provides a fully managed, Apache Pinot-based
real-time analytics database in its cloud environment.
Upstash Kafka Setup
Create a Kafka cluster using the Upstash Console or
the Upstash CLI by following
Getting Started.
Create a topic by following the creating topic
steps. This topic will be the
source for the Apache Pinot table running on StarTree. Let’s name it
“transcript” for this tutorial.
StarTree Setup
To use StarTree Cloud, you first need to
create an account.
There are two steps to initialize the cloud environment on StarTree. First, you
need to create an organization. Next, you need to create a workspace under this
new organization.
For these setup steps, you can also follow
StarTree quickstart.
Connect StarTree Cloud to Upstash Kafka
Once you have created your workspace, open Data Manager under the Services section
in your workspace. Data Manager is where we will connect Upstash Kafka and work
on the Pinot table.
To connect Upstash Kafka with StarTree, create a new connection in Data Manager.
As the connection type, select Kafka.
In the Kafka connection settings, fill in the following options:
- Connection Name: Any name you choose.
- Broker Url: The endpoint of your Upstash Kafka cluster. You can find it in the details section of your Upstash Kafka cluster.
- Authentication Type: SASL
- Security Protocol: SASL_SSL
- SASL Mechanism: SCRAM-SHA-256
- Username: The username given in the details section of your Upstash Kafka cluster.
- Password: The password given in the details section of your Upstash Kafka cluster.
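For reference, the same values map onto an ordinary Kafka client configuration. The sketch below is an illustration only and is not part of the StarTree setup. It assumes the keyword-argument names used by the kafka-python client; the helper name `upstash_kafka_settings`, the endpoint, and the credentials are hypothetical placeholders.

```python
def upstash_kafka_settings(endpoint: str, username: str, password: str) -> dict:
    """Collect the values entered in Data Manager into one dict.

    The keys follow kafka-python's keyword arguments (an assumption about
    which client you use); the values mirror the form fields above.
    """
    if not endpoint or not username or not password:
        raise ValueError("endpoint, username and password are all required")
    return {
        "bootstrap_servers": endpoint,       # Broker Url
        "security_protocol": "SASL_SSL",     # Security Protocol
        "sasl_mechanism": "SCRAM-SHA-256",   # SASL Mechanism
        "sasl_plain_username": username,     # Username
        "sasl_plain_password": password,     # Password
    }


# Placeholder values; copy the real ones from your cluster's details section.
settings = upstash_kafka_settings(
    "example-endpoint.upstash.io:9092", "example-user", "example-password"
)
print(settings["security_protocol"], settings["sasl_mechanism"])
```

If kafka-python is installed, a dict like this could be passed as `KafkaProducer(**settings)` to check the credentials outside StarTree.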
Before you can proceed, you need to test the connection. Once the test
succeeds, you can create the connection.
Now you have a connection between Upstash Kafka and StarTree Cloud! The next
step is to create a dataset to store data streamed from Upstash Kafka.
Let’s return to the Data Manager overview page and create a new dataset.
As the connection type, select Kafka again.
Now you can select the Kafka connection you created for connecting Upstash
Kafka.
In the next step, you need to name your dataset, provide the Kafka topic that
will be the source of this new dataset, and define the data format. We can give
“transcript” as the topic and select JSON as the data format.
To proceed to the next step, we must first produce a message in our Kafka topic.
StarTree doesn’t allow us to go to the next step before it validates that the
connection is working and data is being streamed correctly.
To make StarTree validate our connection, let’s return to the Upstash console
and create some events for our Kafka topic. To do this, click on your Kafka
cluster in the Upstash console and go to the “Topics” section. Open the source
topic, which is “transcript” in this case. Select the Messages tab, then click
Produce a new message. Send a message in JSON format like the one below:
{
  "studentID": 205,
  "firstName": "Natalie",
  "lastName": "Jones",
  "gender": "Female",
  "subject": "Maths",
  "score": 3.8,
  "timestampInEpoch": 1571900400000
}
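Before sending the message, it can help to confirm that the payload is valid JSON and that the timestamp is what you expect. The following is a minimal sketch using only the Python standard library. The variable names are illustrative, and it assumes the 13-digit `timestampInEpoch` value is in epoch milliseconds.

```python
import json
from datetime import datetime, timezone

# The sample event from this tutorial.
message = {
    "studentID": 205,
    "firstName": "Natalie",
    "lastName": "Jones",
    "gender": "Female",
    "subject": "Maths",
    "score": 3.8,
    "timestampInEpoch": 1571900400000,
}

# Serialize to the JSON text you would paste into the Upstash console.
payload = json.dumps(message)

# Round-trip to confirm the payload parses back to the same record.
assert json.loads(payload) == message

# Assumed to be milliseconds, so divide by 1000 before converting to a datetime.
event_time = datetime.fromtimestamp(
    message["timestampInEpoch"] / 1000, tz=timezone.utc
)
print(payload)
print(event_time.isoformat())
```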
Now go back to the dataset details step in StarTree Data Manager.
After you click next, StarTree will consume the message in the source Kafka
topic to verify the connection. Once it consumes the message, the message will
be displayed.
In the next step, StarTree extracts the data model from the message you sent.
If there is any additional configuration about the model of the data coming from
the source topic, you can add it here.
To keep things simple, we will click next without changing anything.
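To give a rough idea of what extracting a data model from a sample message involves, the sketch below guesses a column type for each field of the example event. It is an illustration only and not StarTree's actual inference logic. The function name `infer_pinot_type` and the mapping rules are assumptions, although INT, LONG, DOUBLE, STRING, BOOLEAN and JSON are Pinot data types.

```python
import json

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def infer_pinot_type(value) -> str:
    """Guess a Pinot data type for one JSON value (hypothetical rules)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "BOOLEAN"
    if isinstance(value, int):
        # Values outside the 32-bit range need a 64-bit column.
        return "INT" if INT32_MIN <= value <= INT32_MAX else "LONG"
    if isinstance(value, float):
        return "DOUBLE"
    if isinstance(value, str):
        return "STRING"
    # Nested objects, arrays and nulls are left as JSON in this sketch.
    return "JSON"


sample = json.loads("""
{
  "studentID": 205,
  "firstName": "Natalie",
  "lastName": "Jones",
  "gender": "Female",
  "subject": "Maths",
  "score": 3.8,
  "timestampInEpoch": 1571900400000
}
""")

schema = {field: infer_pinot_type(value) for field, value in sample.items()}
for field, dtype in schema.items():
    print(f"{field}: {dtype}")
```

Under these assumed rules, `timestampInEpoch` comes out as LONG because its value does not fit in 32 bits, while `studentID` fits in INT.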
The last step offers additional configuration options for your dataset. We will click next
again and proceed to review. Click “Create Dataset” to finalize the dataset.
Query Data
Open the dataset you created in StarTree Data Manager and navigate to the query
console.
You will be redirected to the Pinot Query Console running on StarTree Cloud.
When you run the following SQL query, you will see the data that came from
Upstash Kafka into your dataset.
select * from <Dataset-name> limit 10
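The `<Dataset-name>` placeholder stands for the name you gave your dataset. As a small sketch, the query text can be assembled in Python. The helper `build_preview_query` and its conservative identifier check are hypothetical conveniences, not part of StarTree or Pinot, and the example assumes the dataset was also named "transcript".

```python
import re


def build_preview_query(dataset_name: str, limit: int = 10) -> str:
    """Fill the dataset name into the preview query from this tutorial."""
    # Conservative check (an assumption): letters, digits and underscores only.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", dataset_name):
        raise ValueError(f"unexpected dataset name: {dataset_name!r}")
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    return f"select * from {dataset_name} limit {limit}"


print(build_preview_query("transcript"))
```

The resulting string can be pasted into the query console in place of the placeholder query.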