ericjmorey

joined 2 years ago

29 August 2024

Jonathan Carter writes:

As it stands now, bcachefs-tools is impossible to maintain in Debian stable. While my primary concerns when packaging are for Debian unstable and the next stable release, I also keep in mind people who have to support these packages long after I stopped caring about them (like Freexian who does LTS support for Debian or Canonical who has long-term Ubuntu support, and probably other organisations that I’ve never even heard of yet). And of course, if bcachefs-tools doesn’t have any usable stable releases, it doesn’t have any LTS releases either, so anyone who needs to support bcachefs-tools long-term has to carry the support burden on their own, and if they bundle its dependencies, then those as well.

I don’t have any solution for fixing this. I suppose if I were upstream I might look into the possibility of at least supporting a larger range of recent dependencies (usually easy enough if you don’t hop onto the newest features right away) so that distributions with stable releases only need to concern themselves with providing some minimum recent versions, but even if that could work, the upstream author is 100% against any solution other than vendoring all its dependencies with the utility and insisting that it must only be built using these bundled dependencies. I’ve made 6 uploads for this package so far this year, but still I constantly get complaints that it’s out of date and that it’s ancient. If a piece of software is considered so old that it’s useless by the time it’s been published for two or three months, then there’s no way it can survive even a usual stable release cycle, never mind any kind of long-term support.

With this in mind ... I decided to remove bcachefs-tools from Debian completely. Although after discussing this with another DD, I was convinced to orphan it instead, which I have now done. I made an upload to experimental so that it’s still available if someone wants to work on it (without having to go through NEW again), it’s been removed from unstable so that it doesn’t migrate to testing, and the ancient (especially by bcachefs-tools standards) versions that are in stable and oldstable will be removed too, since they are very likely to cause damage with any recent kernel versions that support bcachefs.

It seems that this is one more iteration of the conflict between Debian's focus on stability and the desire to use the latest products, tools, and features.

I'm happy to see that, instead of being removed completely, bcachefs-tools has been orphaned, so it will be easier for someone to pick up maintenance of the package. I'm excited to see bcachefs get closer to becoming a mainstream filesystem, but getting there will take time, as issues like these will have to be worked through for any LTS- or stability-focused distribution.

 

About this course

Who is this course for?

You: Are a beginner in the field of machine learning or deep learning or AI and would like to learn PyTorch.

This course: Teaches you PyTorch and many machine learning, deep learning and AI concepts in a hands-on, code-first way.

If you already have 1-year+ experience in machine learning, this course may help but it is specifically designed to be beginner-friendly.

What are the prerequisites?

  • 3-6 months coding Python.
  • At least one beginner machine learning course (though this can likely be skipped, since resources are linked for many different topics).
  • Experience using Jupyter Notebooks or Google Colab (though you can pick this up as we go along).
  • A willingness to learn (most important).
 

Andres Vourakis writes:

Data Scientist Handbook 2024

Curated resources (Free & Paid) to help data scientists learn, grow, and break into the field of data science.

Even though there are hundreds of resources out there (too many to keep track of), I will try to limit them to a maximum of 5 per category to ensure you get the most valuable and relevant resources. Plus, the whole point of this repository is to help you avoid getting overwhelmed by too many choices. This way you can spend less time researching and more time learning.

FAQs

  • How is curation done? Curation is based on thorough research, recommendations from people I trust, and my years of experience as a Data Scientist.
  • Are all resources free? Most resources here will be free, but I will also include paid alternatives if they are truly valuable to your career development. All paid resources include the symbol 💲.
  • How often is the repository updated? I plan to come back here as often as possible to ensure all resources are still available and relevant and also to add new ones.
 

Book Description

Writing a C Compiler will take you step by step through the process of building your own compiler for a significant subset of C—no prior experience with compiler construction or assembly code needed. Once you’ve built a working compiler for the simplest C program, you’ll add new features chapter by chapter. The algorithms in the book are all in pseudocode, so you can implement your compiler in whatever language you like. Along the way, you’ll explore key concepts like:

  • Lexing and parsing: Learn how to write a lexer and recursive descent parser that transform C code into an abstract syntax tree.
  • Program analysis: Discover how to analyze a program to understand its behavior and detect errors.
  • Code generation: Learn how to translate C language constructs like arithmetic operations, function calls, and control-flow statements into x64 assembly code.
  • Optimization techniques: Improve performance with methods like constant folding, dead store elimination, and register allocation.
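The book's algorithms are given in pseudocode, so what follows is only a minimal Python sketch of two ideas from that list, built on our own assumptions: a lexer and recursive descent parser for integer arithmetic expressions (a tiny expression language, not real C) that produce a tuple-based AST, followed by a constant-folding pass. The grammar, the AST shape, and the names `lex`, `Parser`, and `fold` are illustrative and not taken from the book. Division folding mimics C's truncation toward zero; integer overflow is ignored because Python integers are unbounded.

```python
import re

# Tokens: integer literals, identifiers, and single-character operators/parens.
TOKEN_RE = re.compile(r"\s*(\d+|[A-Za-z_]\w*|[-+*/()])")


def lex(src: str) -> list[str]:
    """Split source text into tokens, rejecting characters the sketch doesn't know."""
    tokens, pos = [], 0
    while src[pos:].strip():  # stop once only whitespace remains
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character near position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    """Recursive descent parser; one method per grammar rule.

    expr   := term (("+" | "-") term)*
    term   := factor (("*" | "/") factor)*
    factor := INT | IDENT | "-" factor | "(" expr ")"
    """

    def __init__(self, tokens: list[str]):
        self.tokens, self.i = tokens, 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected or 'a token'!r}, got {tok!r}")
        self.i += 1
        return tok

    def parse(self):
        node = self.expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected trailing token {self.peek()!r}")
        return node

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            node = ("bin", op, node, self.term())  # left-associative
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            node = ("bin", op, node, self.factor())
        return node

    def factor(self):
        tok = self.take()
        if tok == "-":
            return ("neg", self.factor())
        if tok == "(":
            node = self.expr()
            self.take(")")
            return node
        if tok.isdigit():
            return ("num", int(tok))
        if tok[0].isalpha() or tok[0] == "_":
            return ("var", tok)
        raise SyntaxError(f"unexpected token {tok!r}")


def c_div(a: int, b: int) -> int:
    """C integer division truncates toward zero; Python's // rounds toward -infinity."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


FOLDABLE = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}


def fold(node):
    """Constant folding: evaluate subtrees whose operands are all literals."""
    kind = node[0]
    if kind in ("num", "var"):
        return node
    if kind == "neg":
        inner = fold(node[1])
        return ("num", -inner[1]) if inner[0] == "num" else ("neg", inner)
    _, op, left, right = node
    left, right = fold(left), fold(right)
    if left[0] == "num" and right[0] == "num":
        a, b = left[1], right[1]
        if op in FOLDABLE:
            return ("num", FOLDABLE[op](a, b))
        if op == "/" and b != 0:  # leave x / 0 alone: it's undefined behavior in C
            return ("num", c_div(a, b))
    return ("bin", op, left, right)
```

For example, `fold(Parser(lex("2 * (3 + 4) - x")).parse())` folds the literal subtree `2 * (3 + 4)` to `14` and leaves the variable alone, giving `("bin", "-", ("num", 14), ("var", "x"))`. Each grammar rule maps to one method, which is what makes recursive descent easy to extend one construct at a time.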

Compilers aren’t terrifying beasts—and with help from this hands-on, accessible guide, you might even turn them into your friends for life.

Author Bio

Nora Sandler is a software engineer based in Seattle. She holds a BS in computer science from the University of Chicago, where she researched the implementation of parallel programming languages. More recently, she’s worked on domain-specific languages at an endpoint security company. You can find her blog on pranks, compilers, and other computer science topics at https://norasandler.com/.

 

July 17, 2024

Allen B. Downey writes:

Elements of Data Science is an introduction to data science for people with no programming experience. My goal is to present a small, powerful subset of Python that allows you to do real work with data as quickly as possible.

Part 1 includes six chapters that introduce basic Python with a focus on working with data.

Part 2 presents exploratory data analysis using Pandas and empiricaldist — it includes a revised and updated version of the material from my popular DataCamp course, “Exploratory Data Analysis in Python.”

Part 3 takes a computational approach to statistical inference, introducing resampling methods, bootstrapping, and randomization tests.
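To make "resampling" concrete, here is a minimal Python sketch (our own, not code from the book) of a percentile bootstrap confidence interval and a two-sided randomization (permutation) test for a difference in means, using only the standard library. The function names, default resample counts, and fixed seed are illustrative assumptions.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap: resample with replacement, recompute the statistic,
    and read the interval off the sorted estimates."""
    rng = random.Random(seed)
    n = len(data)
    estimates = sorted(stat(rng.choices(data, k=n)) for _ in range(n_resamples))
    lo_idx = int(alpha / 2 * n_resamples)
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))
    return estimates[lo_idx], estimates[hi_idx]


def permutation_test(a, b, n_permutations=5000, seed=0):
    """Two-sided randomization test: how often does shuffling the group labels
    produce a difference in means at least as large as the observed one?"""
    rng = random.Random(seed)
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[: len(a)]) - statistics.mean(pooled[len(a):]))
        if diff >= observed:
            count += 1
    # Adding 1 to numerator and denominator keeps the p-value from being exactly zero.
    return (count + 1) / (n_permutations + 1)
```

The bootstrap treats the sample as a stand-in for the population; the permutation test simulates the null hypothesis that group membership doesn't matter. Both trade closed-form formulas for computation, which is the approach the paragraph above describes.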

Part 4 is the first of two case studies. It uses data from the General Social Survey to explore changes in political beliefs and attitudes in the U.S. in the last 50 years. The data points on the cover are from one of the graphs in this section.

Part 5 is the second case study, which introduces classification algorithms and the metrics used to evaluate them — and discusses the challenges of algorithmic decision-making in the context of criminal justice.
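As a rough illustration of the kind of evaluation metrics mentioned above, the Python sketch below computes accuracy, precision, recall, and false positive rate from binary labels, then repeats the calculation per group, since comparing error rates across groups is one common way such fairness questions are framed. The function names and the toy data in the usage note are hypothetical; this is not code from the book.

```python
from collections import Counter


def confusion(y_true, y_pred):
    """Count true/false positives/negatives for binary labels (1 = positive)."""
    c = Counter(zip(y_true, y_pred))
    return {"tp": c[(1, 1)], "fp": c[(0, 1)], "fn": c[(1, 0)], "tn": c[(0, 0)]}


def ratio(num, den):
    # Undefined metrics (empty denominator) are reported as NaN rather than 0.
    return num / den if den else float("nan")


def metrics(y_true, y_pred):
    m = confusion(y_true, y_pred)
    tp, fp, fn, tn = m["tp"], m["fp"], m["fn"], m["tn"]
    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "false_positive_rate": ratio(fp, fp + tn),
    }


def by_group(y_true, y_pred, groups):
    """Compute the same metrics separately for each group label."""
    out = {}
    for g in sorted(set(groups)):
        idx = [i for i, x in enumerate(groups) if x == g]
        out[g] = metrics([y_true[i] for i in idx], [y_pred[i] for i in idx])
    return out
```

With `y_true = [1, 0, 1, 0, 0, 1]`, `y_pred = [1, 1, 0, 0, 0, 1]`, and groups `["a", "a", "a", "b", "b", "b"]`, the overall false positive rate is 1/3, but it is 1.0 for group `a` and 0.0 for group `b`: a single aggregate number can hide very different error rates.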

This project started in 2019, when I collaborated with a group at Harvard to create a data science class for people with no programming experience. We discussed some of the design decisions that went into the course and the book in this article.

Read Elements of Data Science in the form of Jupyter notebooks.

 

Book Preface:

Welcome to Apache Iceberg: The Definitive Guide! We’re delighted you have embarked on this learning journey with us. In this preface, we provide an overview of this book, why we wrote it, and how you can make the most of it.

About This Book

In these pages, you’ll learn what Apache Iceberg is, why it exists, how it works, and how to harness its power. Designed for data engineers, architects, scientists, and analysts working with large datasets across various use cases from BI dashboards to AI/ML, this book explores the core concepts, inner workings, and practical applications of Apache Iceberg. By the time you reach the end, you will have grasped the essentials and possess the practical knowledge to implement Apache Iceberg effectively in your data projects. Whether you are a newcomer or an experienced practitioner, Apache Iceberg: The Definitive Guide will be your trusted companion on this enlightening journey into Apache Iceberg.

Why We Wrote This Book

As we observed the rapid growth and adoption of the Apache Iceberg ecosystem, it became evident that a growing knowledge gap needed to be addressed. Initially, we began by sharing insights through a series of blog posts on the Dremio platform to provide valuable information to the burgeoning Iceberg community. However, it soon became clear that a comprehensive and centralized resource was essential to meet the increasing demand for a definitive Iceberg reference. This realization was the driving force behind the creation of Apache Iceberg: The Definitive Guide. Our goal is to provide readers with a single authoritative source that bridges the knowledge gap and empowers individuals and organizations to make the most of Apache Iceberg’s capabilities in their data-related endeavors.

What You Will Find Inside

In the following chapters, you will learn what Apache Iceberg is and how it works, how you can take advantage of the format with a variety of tools, and best practices to manage the quality and governance of the data in Apache Iceberg tables. Here is a summary of each chapter’s content:

  • Chapter 1, “Introduction to Apache Iceberg”
    Exploration of the historical context of data lakehouses and the essential concepts underlying Apache Iceberg.
  • Chapter 2, “The Architecture of Apache Iceberg”
    Deep dive into the intricate design of Apache Iceberg, examining how its various components function together.
  • Chapter 3, “Lifecycle of Write and Read Queries”
    Examination of the step-by-step process involved in Apache Iceberg transactions, highlighting updates, reads, and time-travel queries.
  • Chapter 4, “Optimizing the Performance of Iceberg Tables”
    Discussions on maintaining optimized performance in Apache Iceberg tables through techniques such as compaction and sorting.
  • Chapter 5, “Iceberg Catalogs”
    In-depth explanation of the role of Apache Iceberg catalogs, exploring the different catalog options available.
  • Chapter 6, “Apache Spark”
    Practical sessions using Apache Spark to manage and interact with Apache Iceberg tables.
  • Chapter 7, “Dremio’s SQL Query Engine”
    Exploration of the Dremio lakehouse platform, focusing on DDL, DML, and table optimization for Apache Iceberg tables.
  • Chapter 8, “AWS Glue”
    Demonstration of the use of AWS Glue Catalog and AWS Glue Studio for working with Apache Iceberg tables.
  • Chapter 9, “Apache Flink”
    Practical exercises in using Apache Flink for streaming data processing with Apache Iceberg tables.
  • Chapter 10, “Apache Iceberg in Production”
    Insights into managing data quality in production, using metadata tables for table health monitoring and employing table and catalog versioning for various operational needs.
  • Chapter 11, “Streaming with Apache Iceberg”
    Use of tools such as Apache Spark, Flink, and AWS Glue for streaming data processing into Iceberg tables.
  • Chapter 12, “Governance and Security”
    Exploration of the application of governance and security at various levels in Apache Iceberg tables, such as storage, semantic layers, and catalogs.
  • Chapter 13, “Migrating to Apache Iceberg”
    Guidelines on transforming existing datasets from different file types and databases into Apache Iceberg tables.
  • Chapter 14, “Real-World Use Cases of Apache Iceberg”
    A look at real-world applications of Apache Iceberg, including business intelligence dashboards and implementing change data capture.

Direct link to PDF

Dremio bills itself as a "Unified Analytics Platform for a Self-Service Lakehouse". The authors of the book work for Dremio and may have an ownership interest in Dremio.

 

What issues or frustrations have you encountered in trying to use and set up Neovim on Windows 11?

I'm currently writing up my experience installing, setting up, and using Neovim on Windows and would like to hear from others who have tried the same. What was annoying, difficult, or impossible in your experience?

 

Based on answers to the following question:

Which development environments did you use regularly over the past year, and which do you want to work with over the next year? Please check all that apply.

Neovim is the most admired code editor in the 2024 Stack Overflow Developer Survey

Source: https://survey.stackoverflow.co/2024/technology#admired-and-desired-new-collab-tools-desire-admire

 

It's broader than a Neovim-specific mapping: I've changed the system keyboard mapping of <Caps Lock> to <Esc> and <F9> to <Caps Lock>.

I think mapping <Caps Lock> to <Esc> isn't uncommon for Neovim users, but I like having <Caps Lock> available for non-Neovim purposes.

 

I asked some LLM chatbots to give me some silly ideas to try. Below are a few of my favorite responses.


From Perplexity.ai

Six Degrees of Wikipedia: Creating a program that finds the shortest path between two random Wikipedia articles using graph traversal algorithms. This applies graph theory concepts to explore connections in a large knowledge base.
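Breadth-first search is the natural fit for "shortest path" when every link counts as one hop. A minimal Python sketch follows, assuming the link graph is already available as a dictionary mapping each article title to the titles it links to; fetching real links from Wikipedia is not shown, and the tiny graph in the usage note is hypothetical. Links are treated as directed, since one article linking to another doesn't imply a link back.

```python
from collections import deque


def shortest_path(links, start, goal):
    """Breadth-first search over a directed link graph.

    Returns the list of articles from start to goal, or None if unreachable.
    """
    if start == goal:
        return [start]
    parent = {start: None}  # also serves as the "visited" set
    queue = deque([start])
    while queue:
        article = queue.popleft()
        for nxt in links.get(article, ()):
            if nxt in parent:
                continue
            parent[nxt] = article
            if nxt == goal:
                # Walk parent pointers back to the start, then reverse.
                path = [nxt]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            queue.append(nxt)
    return None
```

With the hypothetical graph `{"A": ["B", "C"], "B": ["D"], "C": ["E"], "D": ["F"], "E": ["F"], "F": []}`, `shortest_path(links, "A", "F")` returns `["A", "B", "D", "F"]`. Because BFS explores articles in order of distance, the first time it reaches the goal is guaranteed to be via a shortest path.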

Emoji Encryption: Using hash tables and cryptographic algorithms to create an encryption system that converts text to emojis. This could be an interesting way to explore cryptography concepts in a fun, visual way.
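A toy sketch of the emoji idea in Python, under loudly stated assumptions: a SHA-256-based keystream (from `hashlib`) is XORed with the UTF-8 bytes of the message, and two dictionaries act as the hash tables that map each byte to one of 256 consecutive code points starting at U+1F400 and back. The keystream construction and the emoji alphabet are our own choices for illustration. It has no nonce or authentication and reuses the same keystream for every message under a key, so it is not secure encryption; some code points in that range may also not render as emoji on every platform.

```python
import hashlib

# Hypothetical emoji alphabet: byte value b <-> code point U+1F400 + b.
BYTE_TO_EMOJI = {b: chr(0x1F400 + b) for b in range(256)}
EMOJI_TO_BYTE = {e: b for b, e in BYTE_TO_EMOJI.items()}


def keystream(key: bytes, n: int) -> bytes:
    """Toy keystream: concatenated SHA-256(key || counter) blocks. NOT secure."""
    out = bytearray()
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:n])


def encrypt(text: str, key: str) -> str:
    data = text.encode("utf-8")
    ks = keystream(key.encode("utf-8"), len(data))
    # XOR each byte with the keystream, then look the result up in the emoji table.
    return "".join(BYTE_TO_EMOJI[d ^ k] for d, k in zip(data, ks))


def decrypt(emojis: str, key: str) -> str:
    data = bytes(EMOJI_TO_BYTE[e] for e in emojis)
    ks = keystream(key.encode("utf-8"), len(data))
    # XOR is its own inverse, so the same keystream recovers the plaintext bytes.
    return bytes(d ^ k for d, k in zip(data, ks)).decode("utf-8")
```

The ciphertext has one emoji per byte of UTF-8 input, so `encrypt("hello, world", "secret")` yields 12 emoji, and `decrypt` with the same key returns the original string.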


From Gemini.google.com

Procrastination Station: This website creates increasingly elaborate and ridiculous tasks to distract you from what you actually need to do. Dishes? Nah, fold your socks into origami cranes!

Dramatic Password Validator: Forget boring error messages. This program rejects weak passwords with Shakespearean insults or movie villain monologues.
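A minimal Python sketch of the idea, assuming a hypothetical set of rules (minimum length and character classes), each paired with an overwrought complaint. The rules and the lines of dialogue are invented here for illustration; they are not part of Gemini's response.

```python
import re

# Hypothetical rules: (check, complaint shown when the check fails).
RULES = [
    (lambda p: len(p) >= 12, "Fewer than twelve characters? A brief candle indeed!"),
    (lambda p: bool(re.search(r"[A-Z]", p)), "No capital letter! Thou lowly, uncrowned string!"),
    (lambda p: bool(re.search(r"[a-z]", p)), "All capitals? Must thou SHOUT thy secrets?"),
    (lambda p: bool(re.search(r"\d", p)), "Not a single digit. You disappoint me, hero."),
    (lambda p: bool(re.search(r"[^A-Za-z0-9]", p)), "No symbols? How utterly... predictable."),
]


def validate(password: str) -> list[str]:
    """Return every dramatic complaint that applies; an empty list means it passes."""
    return [complaint for check, complaint in RULES if not check(password)]
```

`validate("Correct-Horse-42")` returns an empty list, while `validate("abc")` triggers four complaints (everything except the lowercase rule). Keeping rules and messages in one table makes it easy to add new insults without touching the logic.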


From Chatgpt.com

  1. Time Travel Email Service: Build a data structure that allows you to send emails to yourself in the past, with time complexity considerations that are totally ignored because it’s time travel.
  2. Mood-Driven Random Number Generator: Implement an algorithm that generates random numbers based on the mood of the user, using sentiment analysis on real-time facial expressions.
 

July 18, 2024

Narek Galstyan writes:

We were naturally curious when we saw Pinecone's blog post comparing Postgres and Pinecone.

In their post on Postgres, Pinecone recognizes that Postgres is easy to start with as a vector database, since most developers are familiar with it. However, they argue that Postgres falls short in terms of quality. They describe issues with index size predictability, index creation resource intensity, metadata filtering performance, and cost.

This is a response to Pinecone's blog post, where we show that Postgres outperforms Pinecone in the same benchmarks with a few additional tweaks. We show that with just 20 lines of additional code, Postgres with the pgvector or lantern extension outperforms Pinecone by reaching 90% recall (compared to Pinecone's 60%) with under 200ms p95 latency.

Read Postgres vs. Pinecone
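The two numbers in that claim, recall and p95 latency, are easy to compute yourself. The Python sketch below assumes recall means recall@k for approximate nearest-neighbor search (the share of the exact top-k neighbors that the index returned in its own top k) and uses the nearest-rank definition of the 95th percentile; the post may define or measure these differently, and the IDs and latencies in the usage note are made up.

```python
import math


def recall_at_k(retrieved, exact, k):
    """Fraction of the exact top-k neighbor IDs found in the first k retrieved IDs."""
    return len(set(retrieved[:k]) & set(exact[:k])) / k


def percentile_nearest_rank(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]
```

For instance, `recall_at_k([1, 2, 3, 9, 10], [1, 2, 3, 4, 5], 5)` is `0.6`, and the p95 of latencies `1..100` ms is `95`. Reporting a tail percentile rather than the mean matters because a few slow queries can dominate user-visible latency.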

ericjmorey@programming.dev 0 points 2 years ago (last edited 2 years ago)

Building from source is the opposite of hacky. It's the recommended way to deal with situations like this, where you are concerned about trust and security. I understand that it's not something you've done before, but it's not as complicated as it sounds. There are many tutorials on how to build programs from source.

I understand that providing official packages for Fedora/RHEL, Ubuntu/Debian, and Arch-based distros, along with a Flatpak and an AppImage, would make a lot of sense, but for whatever reason, Signal has decided not to. Perhaps you can message the Signal team to ask why they choose not to do this.

ericjmorey@programming.dev 0 points 2 years ago

I'm not sure how Canonical is connected to this.

ericjmorey@programming.dev 0 points 2 years ago

Disinvestment in Python, Flutter, and Dart is a clear signal that those tools are unimportant to Google. I won't be recommending that anyone use Dart or Flutter for new projects.

ericjmorey@programming.dev 2 points 2 years ago

I also did not create this.

ericjmorey@programming.dev 1 point 2 years ago

This is original work. The source is in the post.

ericjmorey@programming.dev 7 points 2 years ago

Randall Munroe has provided me with weekly nibbles of entertainment for nearly two decades. But this was inspired by his style, not created by him.

ericjmorey@programming.dev 2 points 2 years ago

You want to talk about it?

ericjmorey@programming.dev 0 points 2 years ago

I bet this is directly related to ChatGPT.
