BigQuery is a fully managed, serverless data warehouse on the Google Cloud Platform infrastructure that provides scalable, cost-effective and fast analytics over petabytes of data. It is offered as a service and supports queries using standard SQL. In this article, I would like to present two main techniques for making your BigQuery data warehouse efficient and performant.
SQL vs NoSQL: SQL databases are table-based, whereas NoSQL databases can be document-based, key-value, or graph databases. SQL databases are vertically scalable, while NoSQL databases are horizontally scalable. …
Improve your IT Assets, Resources and Capabilities to enhance your Business Success. The management of a company must be sure that its IT adequately supports the company’s goals. IT Management and good IT Governance are responsible for this. The Cloud and its commoditization of IT assets and resources have a massive impact on IT Governance as a whole. Smaller businesses and start-ups in particular can profit. How the cloud can help to meet the company’s goals is answered in the following article.
The figure below gives a short, high-level overview of how IT Governance is defined:
When integrating data from system A to system B, data engineers and other stakeholders should not only focus on the data process, e.g. via ETL/ELT, but also on the source system. The following are the circumstances that must be taken into account, based on what I learned from earlier projects:
When is a source system available? You have to consider maintenance cycles, downtimes, etc. Otherwise, if the system is not available, the data integration process will not work or only part of the data will be captured. Here, it makes sense to implement monitoring of the source system and work…
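The availability question above can be sketched in a few lines of Python. This is only a minimal illustration, assuming the maintenance windows are known in advance; the window values and the function names `in_maintenance` and `should_run_extract` are hypothetical and not taken from the article.

```python
from datetime import datetime, time

# Hypothetical maintenance windows of the source system: (weekday, start, end),
# with Monday = 0. These values are illustrative assumptions only.
MAINTENANCE_WINDOWS = [
    (6, time(2, 0), time(4, 0)),     # Sunday 02:00-04:00
    (2, time(22, 0), time(23, 30)),  # Wednesday 22:00-23:30
]


def in_maintenance(moment: datetime, windows=MAINTENANCE_WINDOWS) -> bool:
    """Return True if `moment` falls inside a known maintenance window."""
    for weekday, start, end in windows:
        if moment.weekday() == weekday and start <= moment.time() < end:
            return True
    return False


def should_run_extract(moment: datetime, source_is_up: bool) -> bool:
    """Run the extract only if the source responds and no maintenance is planned."""
    return source_is_up and not in_maintenance(moment)
```

In a real pipeline, `source_is_up` would come from whatever health check the source system offers, and a skipped run would be logged and retried later so that no partial load goes unnoticed.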
When setting up a Big Data landscape, there are five steps and topic blocks that must be taken into account during implementation.
In order to process data in a data lake or data warehouse, to analyze it or to make it usable for other systems, data must first be made available from the source systems. Examples of sources could be:
Besides classical batch and ETL process data integration (e.g…
In the field of Data Analytics and related topics like BI, Data Science, Data Engineering etc., you will often hear about the same problems when working in a project or on a product. Here, I want to share my experiences and possible solutions.
One of the most unpleasant moments in the life of every project or product manager is when the business department complains about the data quality. The problems can be of different kinds: errors in the source system, in the ETL process or in the report.
Personal data is the core concept of data protection. Data protection law only applies when data relates to individuals. The GDPR, for example, increases fines to up to 20 million euros or, in the case of large companies and groups, up to 4% of the global group turnover of the previous year. When working in the field of Big Data, Data Science or related fields, it is essential to know about these laws and how anonymization and pseudonymization make it possible to still use the data for your use cases.
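As a rough illustration of pseudonymization, the following Python sketch replaces identifying fields with a keyed hash (HMAC-SHA256). It is one possible approach under stated assumptions, not a method described in the article; the function names, the field names and the key handling are hypothetical.

```python
import hashlib
import hmac


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Map a value to a stable pseudonym using a keyed hash (HMAC-SHA256)."""
    normalized = value.strip().lower()
    return hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymize_record(record: dict, fields: tuple, secret_key: bytes) -> dict:
    """Return a copy of `record` with the given string fields pseudonymized."""
    return {
        key: pseudonymize(value, secret_key) if key in fields else value
        for key, value in record.items()
    }
```

Because the same input and key always give the same pseudonym, records can still be joined and counted per person. Anyone holding the key can recompute the mapping, so the key has to be stored separately from the data, and data treated this way still counts as personal data under the GDPR, unlike truly anonymized data.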
This is any information relating to an identified or…
In the world of Big Data, data visualization tools and techniques are essential to analyze large amounts of information and make data-driven decisions as data is increasingly used for important management decisions. So there is a trend away from gut feeling and emotional decisions towards rational choices that are made based on numbers. Therefore, reports and visualizations have to be easily understood and meaningful.
It is increasingly beneficial for professionals to be able to use data to make decisions and visuals to tell stories that communicate how data informs the question of person, subject, time, place, and method. In…
Communication between people and devices over the Internet and other networks works through protocols, and the File Transfer Protocol (FTP) is one of them. Because FTP is an older method of data transfer, such transfers are compatible with many legacy and/or on-premises HR and business systems, making it a useful option if you want to integrate an older system with newer, cloud-based software.
In the past, most digital systems were connected via FTP integration, where one system exports data in a “flat file” format (often a spreadsheet) and another system imports the data. …
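The export and import pattern described here can be sketched in Python with the standard `csv` module. This is a minimal illustration, not the setup of any particular system: the helper names are hypothetical, and the `ftp` argument is assumed to be an already connected and logged-in `ftplib.FTP` object, whose `storbinary` and `retrbinary` methods are used for the transfer.

```python
import csv
import io


def rows_to_csv_bytes(rows, fieldnames):
    """Exporting side: serialize a list of dicts to a CSV flat file in memory."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue().encode("utf-8")


def csv_bytes_to_rows(data):
    """Importing side: parse the flat file back into a list of dicts."""
    return list(csv.DictReader(io.StringIO(data.decode("utf-8"))))


def upload_flat_file(ftp, remote_name, data):
    """Send the file with an FTP STOR command."""
    ftp.storbinary(f"STOR {remote_name}", io.BytesIO(data))


def download_flat_file(ftp, remote_name):
    """Fetch the file with an FTP RETR command and return its bytes."""
    chunks = []
    ftp.retrbinary(f"RETR {remote_name}", chunks.append)
    return b"".join(chunks)
```

Note that a CSV flat file carries no type information, so every value arrives on the importing side as a string and has to be converted there.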
Whether it’s for university, your job, or simply as input for your next story, there are many interesting sources for free whitepapers and educational material in the field of data. Some sources clearly intend to sell a product, but with the sources I use, scientific thought is mostly in the foreground. Here are my top places to go:
Everyone knows the “For Dummies” series. You can buy the books on Amazon and in good book stores. Snowflake offers them for free after you have registered. Top current topics like…
Work more efficiently with the powerful BigQuery IDE powered by AI that supports Data Engineers, Scientists and BI Developers.
The Chrome add-on features:
- AI engine that optimizes your queries in real-time.
- Adaptive Caching — Never pay twice for the same query.
- Write queries faster with context-aware Smart Compose.
- Execute up to 20 queries at the same time.
- Auto-Detect Standard / Legacy SQL.
- Use variables to store values and shorten your workflow.
- Visualize query results with integrated dashboards.
- Download up to 6,000,000 rows to CSV.
Big Data Enthusiast based in Hamburg and Kiel.