Return

This Week in Databend #87

April 1, 2023 · 4 min read

PsiACE

Stay up to date with the latest weekly developments on Databend!


Databend is a modern cloud data warehouse, serving your massive-scale analytics needs at low cost and complexity. Open source alternative to Snowflake. Also available in the cloud: https://app.databend.com .

What's On In Databend

Stay connected with the latest news about Databend.

Window Function

Aggregate window functions can operate on a group of rows and return a single value for each row in the underlying query. The OVER clause specifies how to partition the rows in the result set. When used with GROUP BY, aggregate window functions do not collapse rows but instead return all rows in the result set.

-- use aggregate window function
SELECT date, AVG(amount) over (partition by date)
FROM BookSold

June 21|544.0
June 21|544.0
June 22|454.5
June 22|454.5
June 23|643.0
June 23|643.0

Databend supports all aggregate functions as aggregate window functions.

Suggest Function Name on Typos

Databend has added an intelligent feature that automatically provides the closest matching item when you enter an incorrect function name.

#> select base64(1);
ERROR 1105 (HY000) at line 1: Code: 1008, displayText = error:
--> SQL:1:8
|
1 | select base64(1)
| ^^^^^^^^^ no function matches the given name: 'base64', do you mean 'to_base64'?

Code Corner

Discover some fascinating code snippets or projects that showcase our work or learning journey.

Dump Running Async Task Stack

Databend now supports dumping the running async task stacks. Simply visit http://<admin_api_address>/debug/async_tasks/dump to capture it in your browser.

Dump Running Async Task Stack

Calling the async_backtrace::taskdump_tree function can obtain information about the asynchronous task tree (collected by #[async_backtrace::framed]).

    let tree =
async_backtrace::taskdump_tree(req.map(|x| x.wait_for_running_tasks).unwrap_or(false));

Tasks are divided into regular tasks and polling tasks (marked with [POLLING]). Record the stack information for each task, and output it to a string sorted by stack depth.

    for mut tasks in [tasks, polling_tasks] {
tasks.sort_by(|l, r| Ord::cmp(&l.stack_frames.len(), &r.stack_frames.len()));

for item in tasks.into_iter().rev() {
for frame in item.stack_frames {
writeln!(output, "{}", frame).unwrap();
}

writeln!(output).unwrap();
}
}

To learn more about how it works, refer to the following link:

New Way to Integrate with Jupyter Notebook

As described in Doc | Visualization Databend Data in Jupyter Notebook, we can use Jupyter Notebook to explore data in Databend.

However, through the trick of magic, ipython-sql provides another interesting way to make SQL queries look like they are running in SQL cells and maintain seamless integration with Python.

Install Dependencies

pip install ipython-sql databend-sqlalchemy

Work with Jupyter Notebook

In [1]: %load_ext sql

In [2]: %%sql databend://{username}:{password}@{host_port_name}/{database_name}?secure=false
...: SHOW DATABASES;

In [3]: result = %%sql SELECT * FROM numbers(100);

In [4]: %matplotlib inline
...: df = result.DataFrame()
...: df.plot()

Highlights

Here are some noteworthy items recorded here, perhaps you can find something that interests you.

  • Connecting to Databend with DBeaver, either by selecting the pre-configured MySQL driver or adding the Databend JDBC driver.
  • Databend provides integration with Redash to help you gain better insights into your data. Doc | Integrations - Redash
  • Master how to display information of columns in a given table. Doc | SHOW COLUMNS
  • Databend now supports generate_series and range table functions.
  • Databend now supports the ai_embedding_vector function, which returns a 1536-dimensional f32 vectors generated by the OpenAI Embeddings API.
  • Databend added support for [CREATE | DROP | SHOW] SHARE ENDPOINT DDL.

What's Up Next

We're always open to cutting-edge technologies and innovative ideas. You're more than welcome to join the community and bring them to Databend.

Collect Metrics from Sled

sled is an embedded database. It has a metrics feature that exposed some metrics.

Databend Meta Service uses sled as the underlying Key-Value storage. We expect to obtain more metrics about sled in order to further improve observability and help with optimization.

Issue #7233 | make use of sled metrics feature for collect sled metrics

Please let us know if you're interested in contributing to this issue, or pick up a good first issue at https://link.databend.rs/i-m-feeling-lucky to get started.

New Contributors

We always open arms to everyone and can't wait to see how you'll help our community grow and thrive.

  • @rkmdCodes made their first contribution in #10269. This pull request improves the error string and makes the prompt clearer.

Changelog

You can check the changelog of Databend Nightly for details about our latest developments.

Full Changelog: https://github.com/datafuselabs/databend/compare/v1.0.33-nightly...v1.0.46-nightly


🎉 Contributors
23 contributors

Thanks a lot to the contributors for their excellent work.

andylokandy
ariesdevil
b41sh
BohuTANG
dantengsky
Dousir9
drmingdrmer
everpcpc
jun0315
leiysky
lichuang
PsiACE
RinChanNOWWW
rkmdCodes
SkyFan2002
soyeric128
sundy-li
TCeason
Xuanwo
xudong963
youngsofun
zhang2014
zhyass

🎈Connect With Us

Databend is a cutting-edge, open-source cloud-native warehouse built with Rust, designed to handle massive-scale analytics.

Join the Databend Community to try, get help, and contribute!

Tags:
#weekly